Supervised All-Words Lexical Substitution using Delexicalized Features
نویسندگان
چکیده
We propose a supervised lexical substitution system that does not use separate classifiers per word and is therefore applicable to any word in the vocabulary. Instead of learning word-specific substitution patterns, a global model for lexical substitution is trained on delexicalized (i.e., non lexical) features, which allows to exploit the power of supervised methods while being able to generalize beyond target words in the training set. This way, our approach remains technically straightforward, provides better performance and similar coverage in comparison to unsupervised approaches. Using features from lexical resources, as well as a variety of features computed from large corpora (n-gram counts, distributional similarity) and a ranking method based on the posterior probabilities obtained from a Maximum Entropy classifier, we improve over the state of the art in the LexSub Best-Precision metric and the Generalized Average Precision measure. Robustness of our approach is demonstrated by evaluating it successfully on two different datasets.
منابع مشابه
Language Transfer Learning for Supervised Lexical Substitution
We propose a framework for lexical substitution that is able to perform transfer learning across languages. Datasets for this task are available in at least three languages (English, Italian, and German). Previous work has addressed each of these tasks in isolation. In contrast, we regard the union of three shared tasks as a combined multilingual dataset. We show that a supervised system can be...
متن کاملLexical Substitution for the Medical Domain
In this paper we examine the lexical substitution task for the medical domain. We adapt the current best system from the open domain, which trains a single classifier for all instances using delexicalized features. We show significant improvements over a strong baseline coming from a distributional thesaurus (DT). Whereas in the open domain system, features derived from WordNet show only slight...
متن کاملUSYD: WSD and Lexical Substitution using the Web1T corpus
This paper describes the University of Sydney’s WSD and Lexical Substitution systems for SemEval-2007. These systems are principally based on evaluating the substitutability of potential synonyms in the context of the target word. Substitutability is measured using Pointwise Mutual Information as obtained from the Web1T corpus. The WSD systems are supervised, while the Lexical Substitution syst...
متن کاملExplorations in lexical sample and all-words lexical substitution
In this paper, we experiment with several techniques to solve the problem of lexical substitution, both in a lexical sample as well as an all-words setting, and compare the benefits of combining multiple lexical resources using both unsupervised and supervised approaches. Overall in the lexical sample setting, the results obtained through the combination of several resources exceed the current ...
متن کاملرویکردی با ناظر در استخراج واژگان کلیدی اسناد فارسی با استفاده از زنجیرههای لغوی
Keywords are the main focal points of interest within a text, which intends to represent the principal concepts outlined in the document. Determining the keywords using traditional methods is a time consuming process and requires specialized knowledge of the subject. For the purposes of indexing the vast expanse of electronic documents, it is important to automate the keyword extraction task. S...
متن کامل